Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allow to append comment to query #709

Merged

Conversation

dkuku
Copy link
Contributor

@dkuku dkuku commented Oct 11, 2024

Recently there was introduced commit_comment to prepend comment before the commit keyword
I wanted to propose something similar for all queries - It will be useful for prepending sqlcommenter comments to queries but any other info is possible.
I tried different hacks to get around this limitation but this has to be implemented in postgrex just before the query is sent to the database.
I found the place where to add this code by trial and error, any suggestion welcome how to make it better.
Also is it possible to test it somehow? Currently I'm logging it in the database but it would be nice to log the data that is sent to the database. Maybe testing the encoder is enough?

2024-10-11 21:44:02.715 BST,"kuku","postgrex_test",624085,"127.0.0.1:55774",67098e12.985d5,1,"SELECT",2024-10-11 21:44:02 BST,3/2286,0,LOG,00000,"execute <unnamed>: /*comment*/
SELECT 321",,,,,,,,,"","client backend",,0  

@josevalim
Copy link
Member

Unfortunately this is not going to work based on how most users use Ecto+Postgres because the queries are prepared: which means the query is only sent once to the database, the first time around. The reason why it works with BEGIN is because it is a textual command that is small and it is sent every time.

@dkuku
Copy link
Contributor Author

dkuku commented Oct 12, 2024

Thanks, I did not know that. What about adding a mode that allows to send the whole query when there is a comment? Libraries in other languages send the whole sql afaik - python example, there is more in other languages in this repo.

@josevalim
Copy link
Member

We can do that and say that passing a comment automatically disables query caching, but then I beleive the implementation here has to be different. I would generally try to deal with it at the protocol level though, instead of changing the binary payloads.

@dkuku
Copy link
Contributor Author

dkuku commented Oct 12, 2024

I'm happy to work on that, I just need to better understand how postgrex works.

@josevalim
Copy link
Member

You probably want to extract out the comment only in this branch: https://github.com/elixir-ecto/postgrex/pull/709/files#diff-0da854f0c1cda9486d776c72ecda6a2e595a7667b72688669bbd80d6b80f0f96R354 and pass the comment as argument to parse_describe_flush after the query. You can adjust other callers to pass nil as the comment.

And, on Postgrex module, if there is a comment, you set postgrex_prepare to false.

@dkuku dkuku force-pushed the allow_to_prepend_query_with_comment branch from 5aee3aa to be74cce Compare October 12, 2024 09:33
@dkuku
Copy link
Contributor Author

dkuku commented Oct 12, 2024

I'm considering whether it should be named differently, like non_cacheable_comment. That way, we could also introduce a cacheable_comment in the future, which could include static values like the module and function name. This would offer a good compromise by making it easier to locate where the query is defined (since we have access to that information using Ecto macros, then it could be added in ecto_sql).
Idea from here:
https://www.adyen.com/knowledge-hub/injecting-metadata-as-comments

end
end

@tag :big_binary
Copy link
Contributor Author

@dkuku dkuku Oct 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes problems when I log all queries in postgres - this inserts all the data in the log file, I need to have a way to skip it.

@greg-rychlewski
Copy link
Member

Sorry if i'm way off the mark, I don't know if I completely understand the ask. But would it work to add prepare: :unnamed to the repo operations instead? That way Postgres is not caching the query. We allow it to be specified per operation so you can pick where it's used.

@dkuku
Copy link
Contributor Author

dkuku commented Oct 13, 2024

When we have options to skip query caching would it be enough to add the comment in the query function where we call iodata_to_binary? instead of modifying the protocol file ?

lib/postgrex.ex Outdated Show resolved Hide resolved
lib/postgrex.ex Outdated Show resolved Hide resolved
lib/postgrex.ex Outdated Show resolved Hide resolved
lib/postgrex/protocol.ex Outdated Show resolved Hide resolved
lib/postgrex/protocol.ex Outdated Show resolved Hide resolved
@josevalim
Copy link
Member

I have dropped some nitpicks. It generally looks good to me :)

@dkuku dkuku force-pushed the allow_to_prepend_query_with_comment branch from 1878393 to f20eec9 Compare October 13, 2024 18:16
@dkuku dkuku force-pushed the allow_to_prepend_query_with_comment branch from f20eec9 to 78a8069 Compare October 13, 2024 18:17
@dkuku dkuku changed the title allow to prepend query with comment allow to append comment to query Oct 13, 2024
@dkuku
Copy link
Contributor Author

dkuku commented Oct 14, 2024

After thinking abount it I wonder if this is exactly what I need.
My initial idea was to append the comment after the query cache - I was under the impression that it can be done in postgrex but I was wrong.
If the query can't be cached then it may be a better idea to append it in the query builder and from there mark the query as non cacheable in case the comment is dynamic or leave it as is when the comment is static (caller module and function name may be enough for most use cases).
My concern is that disabling the query cache can increase database cpu load - it may not be a problem in small projects but in bigger it may be a concern. Datadog library for example does trace sampling - maybe there is a way to know if a trace will be sampled and only include the dynamic parts on demand otherwise allow just to tag query with the basic info and have caching working.
I can prepare an rfc how I imagine it and get feedback on ecto mailing list and the observability group.

@josevalim
Copy link
Member

If the query can't be cached then it may be a better idea to append it in the query builder and from there mark the query as non cacheable in case the comment is dynamic or leave it as is when the comment is static (caller module and function name may be enough for most use cases).

Given queries are composable, which module and function would you use? If you want to use the module and function from the repository, then it is somewhat dynamic, but it is already information that we compute anyway, so you are right we could tackle it at the builder.

@@ -1593,6 +1606,16 @@ defmodule Postgrex.Protocol do
transaction_error(s, postgres)
end

defp parse_describe_comment_msgs(query, comment, tail) when is_binary(comment) do
statement = query.statement <> "/*#{comment}*/"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems we may no longer go wit hthis PR, but I wonder if this could be dealt as IO lists:

Suggested change
statement = query.statement <> "/*#{comment}*/"
statement = [query.statement, "/*", comment, "*/"]

@dkuku
Copy link
Contributor Author

dkuku commented Oct 22, 2024

So how do we proceed with that pr?
I think it would be beneficial to still have a way to force the prepare
Repo.all(query, comment: "repo:MyApp.Repo,app:Myapp.Web", prepare: true)

false ->
comment = Keyword.get(opts, :comment)

if is_binary(comment) && String.contains?(comment, "*/") do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if is_binary(comment) && String.contains?(comment, "*/") do
if is_binary(comment) and String.contains?(comment, "*/") do

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move this validation to the client. There is no need to crash the connection on bad argument. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So should I just remove the whole if here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should move it to the client, to the Postgrex module.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added it there in the query function.
In theory the comment could be also an iolist but Currently we are getting CaseClauseError.
I think this is hard to validate for sql injection unless we first dump it to a string.

Copy link
Member

@josevalim josevalim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added one tiny nitpick and we can ship it!

@dkuku dkuku requested a review from josevalim October 23, 2024 17:26
@josevalim josevalim merged commit 4971a27 into elixir-ecto:master Oct 24, 2024
9 checks passed
@josevalim
Copy link
Member

💚 💙 💜 💛 ❤️

@josevalim
Copy link
Member

Thank you for leading this. I am sure people will appreciate this feature. Please give main a try in conjuction with Ecto and let us know if it works as expected.

@dkuku
Copy link
Contributor Author

dkuku commented Oct 24, 2024

Thanks, I will try it over the weekend.

@dkuku
Copy link
Contributor Author

dkuku commented Oct 27, 2024

We missed something when skipping prepare - I need to dig deeper into it, but that's what I currently have:
I created a repo module that has every function replaced with a macro.
Now with such module livebook:

defmodule W do
  require Test.Repo
  alias Test.Repo
  import Ecto.Query

  def insert do
    weather1 = %Weather{temp_lo: 0, temp_hi: 23}
    Repo.insert!(weather1)
  end

  def get_query do
    values = [%{id: 1, name: "Zabrze"}, %{id: 2, name: "Dudley"}]
    Weather
    |> join(:inner, [w], w1 in values(values, %{id: :integer, name: :string}), on: w.id == w1.id)
    |> select([w, c], %Weather{id: w.id, city: c.name})
    |> Repo.all()
  end

  def get_schema do
    Weather
    |> select([w], %Weather{id: w.id})
    |> where([w], not is_nil(w.id))
    |> Repo.all()
  end
end

And calling it one after one, I'm getting the comment only on one of the queries:

2024-10-27 07:24:07.266 GMT,"postgres","postgres",192987,"127.0.0.1:43910",671dea97.2f1db,1,"INSERT",2024-10-27 07:24:07 GMT,8/7,0,LOG,00000,"execute ecto_insert_weather_0: INSERT INTO ""weather"" (""temp_lo"",""temp_hi"",""prcp"",""inserted_at"",""updated_at"") VALUES ($1,$2,$3,$4,$5) RETURNING ""id""","parameters: $1 = '2', $2 = '23', $3 = '0', $4 = '2024-10-27 07:24:07', $5 = '2024-10-27 07:24:07'",,,,,,,,"","client backend",,0
2024-10-27 07:24:07.275 GMT,"postgres","postgres",192984,"127.0.0.1:43882",671dea97.2f1d8,1,"SELECT",2024-10-27 07:24:07 GMT,9/7,0,LOG,00000,"execute <unnamed>: SELECT w0.""id"", v1.""name"" FROM ""weather"" AS w0 INNER JOIN (VALUES ($1::bigint,$2::varchar),($3::bigint,$4::varchar)) AS v1 (""id"",""name"") ON w0.""id"" = v1.""id""/*app='test_app',function='get_query%2F0',line='18',module='W',owner='team_c'*/","parameters: $1 = '1', $2 = 'Zabrze', $3 = '2', $4 = 'Dudley'",,,,,,,,"","client backend",,0
2024-10-27 07:24:07.278 GMT,"postgres","postgres",192981,"127.0.0.1:43868",671dea97.2f1d5,1,"SELECT",2024-10-27 07:24:07 GMT,4/6,0,LOG,00000,"execute ecto_7179: SELECT w0.""id"" FROM ""weather"" AS w0 WHERE (NOT (w0.""id"" IS NULL))",,,,,,,,,"","client backend",,0

as you see only one is unnamed.
When I add the option prepare: :unnamed to default_options then the second select is unnamed but the insert still has a name.

@josevalim
Copy link
Member

josevalim commented Oct 27, 2024

@dkuku in Ecto, we pass a cache_statement to Postgrex.query, and I think that's overriding the prepare: false. Around here: https://github.com/elixir-ecto/postgrex/blob/master/lib/postgrex.ex#L303

We should probably ignore the cache statement if a comment is given.

Also, you shouldn't need to replace all Ecto.Repo functions. We already support stacktrace: true as part of the Repo configuration. If you set it to true, you can override def prepare_options (or a similar callback) to read the :stacktrace option and use it to build the comment option. :)

@dkuku
Copy link
Contributor Author

dkuku commented Oct 27, 2024

This is getting a bit complicated, I have to pass the comment around too many function and still it is not shown everywhere.
I think much easier would be to append it to the statement asap and just properly validate that it is not prepared - for example by setting prepare: :unnamed?

I have also a question about the stacktrace - do you have any suggestion:
When i use Ecto.Repo the stacktrace that I see when inspecting in prepare_query has not much info about the code I'm running. There is only the line where use Ecto.Repo - nothing about my code.

{:all,
 #Ecto.Query<from w0 in Weather, where: not is_nil(w0.id),
  select: %Weather{id: w0.id}>,
 [
   stacktrace: [
     {Ecto.Repo.Supervisor, :tuplet, 2, [file: ~c"lib/ecto/repo/supervisor.ex", line: 179]},
     {Test.Repo, :all, 2, [file: ~c"livemd/ecto.livemd#cell:pwz2dlcaejzj2duk", line: 2]},
     {:elixir, :eval_external_handler, 3, [file: ~c"src/elixir.erl", line: 386]},
     {:erl_eval, :do_apply, 7, [file: ~c"erl_eval.erl", line: 904]},
     {:elixir, :eval_forms, 4, [file: ~c"src/elixir.erl", line: 364]},
     {Module.ParallelChecker, :verify, 1, [file: ~c"lib/module/parallel_checker.ex", line: 112]},
     {Livebook.Runtime.Evaluator, :"-eval/4-fun-0-", 3, [file: ~c"lib/livebook/runtime/evaluator.ex", line: 635]},
     {Code, :with_diagnostics, 2, [file: ~c"lib/code.ex", line: 621]},
     {Livebook.Runtime.Evaluator, :eval, 4, [file: ~c"lib/livebook/runtime/evaluator.ex", line: 629]},
     {Livebook.Runtime.Evaluator, :continue_do_evaluate_code, 6, [file: ~c"lib/livebook/runtime/evaluator.ex", line: 437]},
     {Livebook.Runtime.Evaluator, :loop, 1, [file: ~c"lib/livebook/runtime/evaluator.ex", line: 337]},
     {:proc_lib, :init_p_do_apply, 3, [file: ~c"proc_lib.erl", line: 329]}
   ]
 ]}

I gave up and went back to the module I made. Then the stacktrace looks like i expect - the module is on the second line. I tested it like 20 times switching between my implementation and Ecto.

{:all,
 #Ecto.Query<from w0 in Weather, where: not is_nil(w0.id),
  select: %Weather{id: w0.id}>,
 [
   stacktrace: [
     {Ecto.Repo.Supervisor, :tuplet, 2, [file: ~c"lib/ecto/repo/supervisor.ex", line: 179]},
     {WeatherContext, :get_schema, 0, [file: ~c"livemd/ecto.livemd#cell:4h52ky3654zcawdu", line: 32]},
     {:elixir, :eval_external_handler, 3, [file: ~c"src/elixir.erl", line: 386]},
     {:erl_eval, :do_apply, 7, [file: ~c"erl_eval.erl", line: 904]},
     {:elixir, :eval_forms, 4, [file: ~c"src/elixir.erl", line: 364]},
     {Module.ParallelChecker, :verify, 1, [file: ~c"lib/module/parallel_checker.ex", line: 112]},
     {Livebook.Runtime.Evaluator, :"-eval/4-fun-0-", 3, [file: ~c"lib/livebook/runtime/evaluator.ex", line: 635]},
     {Code, :with_diagnostics, 2, [file: ~c"lib/code.ex", line: 621]},
     {Livebook.Runtime.Evaluator, :eval, 4, [file: ~c"lib/livebook/runtime/evaluator.ex", line: 629]},
     {Livebook.Runtime.Evaluator, :continue_do_evaluate_code, 6, [file: ~c"lib/livebook/runtime/evaluator.ex", line: 437]},
     {Livebook.Runtime.Evaluator, :loop, 1, [file: ~c"lib/livebook/runtime/evaluator.ex", line: 337]},
     {:proc_lib, :init_p_do_apply, 3, [file: ~c"proc_lib.erl", line: 329]}
   ]

The code looks like this:
image

@josevalim
Copy link
Member

I mentioned this in my previous comment:

Also, you shouldn't need to replace all Ecto.Repo functions. We already support stacktrace: true as part of the Repo configuration. If you set it to true, you can override def prepare_options (or a similar callback) to read the :stacktrace option and use it to build the comment option. :)

It doesn't work? That should alleviate the need to pass comments. You should only need to do it on prepare_query/prepare_options/etc.

I have also a question about the stacktrace - do you have any suggestion

Yes, don't use the shell/livebook. Those are evaluated. Stacktrace is precise on compiled code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants